An Initial Analysis of Topic-based Similarity among Scientific Documents Based on their Rhetorical Discourse Parts

Authors

  • Carlos Badenes-Olmedo
  • José Luis Redondo García
  • Óscar Corcho
Abstract

Summaries and abstracts of research papers have traditionally been used for many purposes by scientists, research practitioners, editors, programme committee members and reviewers (e.g. to identify relevant papers to read or publish, to cite them, or to explore new fields and disciplines). As a result, many paper repositories only store or expose abstracts, which may limit the capacity to find the right paper for a specific research purpose. Given their size limitations and concise nature, abstracts usually omit explicit references to some contributions and impacts of the paper. Therefore, for certain information retrieval tasks they cannot be considered the most appropriate excerpt of the paper on which to base these operations. In this paper we study other kinds of summaries, built from textual fragments falling under certain categories of the scientific discourse, such as outcome, background and approach, in order to decide which one is the most appropriate substitute for the original text. In particular, two novel measures are proposed: (1) internal-representativeness, which evaluates how well a summary describes what the full text is about, and (2) external-representativeness, which evaluates the potential of a summary to discover related texts. Results suggest that summaries explaining the method of a scientific article give a more accurate description of the full content than others. In addition, more relevant related articles are discovered from summaries describing the method, together with those containing the background knowledge or the outcomes of the research paper.
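The abstract names the two measures but does not give their formulas, so the following is only a minimal sketch of how they could be computed under stated assumptions. It assumes each text is already represented as a probability distribution over topics (for example, from a topic model), that topic-based similarity is taken as one minus the Jensen-Shannon divergence, and that external-representativeness is approximated as the overlap between the top-k related documents found from the summary and from the full text. All function names and the choice of similarity are hypothetical and are not taken from the paper.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    # Mixture distribution; strictly positive wherever p or q is positive.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability terms of x.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Assumed similarity: 1 - JSD, which lies in [0, 1] for base-2 logs."""
    return 1.0 - js_divergence(p, q)


def internal_representativeness(summary_topics, full_text_topics):
    """Assumed definition: similarity between a summary and its own full text."""
    return topic_similarity(summary_topics, full_text_topics)


def external_representativeness(summary_topics, full_text_topics, corpus, k):
    """Assumed definition: share of the full text's top-k related documents
    that are also retrieved when querying with the summary instead.

    corpus maps a document id to that document's topic distribution.
    """
    def top_k(query):
        ranked = sorted(
            corpus,
            key=lambda doc_id: topic_similarity(query, corpus[doc_id]),
            reverse=True,
        )
        return set(ranked[:k])

    return len(top_k(summary_topics) & top_k(full_text_topics)) / k
```

Under these assumptions, a summary whose topic distribution matches that of the full text scores 1.0 on the internal measure, and a summary that retrieves the same neighbours as the full text scores 1.0 on the external one.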

Similar resources

Hybridity of Scientific Discourses: an Intertextual Perspective and Implications for ESP Pedagogy

In light of a large number of admirable attempts which look at scientific discourse from social, dialogic and interpersonal points of view, the propositions which consider scientific discourse as an interactive endeavor are now well-established. By the force of our social constructivist gyrations, we have developed glimpses of a social, cultural and historical dimension in which the discourse o...

Thai Rhetorical Structure Analysis

Rhetorical structure analysis (RSA) explores discourse relations among elementary discourse units (EDUs) in a text. It is very useful in many text processing tasks employing relationships among EDUs such as text understanding, summarization, and question-answering. Thai language with its distinctive linguistic characteristics requires a unique technique. This article proposes an approach for Th...

Genre Analysis of ELT and Nursing Academic Written Discourse through Introduction

Since Swales’ (1981, 1990) work on the CARS model of the move structure of research articles, many studies on genre analysis have been carried out, among which work on different parts of research articles in various disciplines has produced a considerable literature. This study aims to investigate the rhetorical structure of the Introduction sections of articles in two fields of English Language Teaching (...

The Effect of Cognitive Factors of Rhetorically Different Listening Tasks on L2 Listening Quality of Iranian Advanced EFL Learners

This study examined the effect of two different authentic topic-familiar rhetorical L2 listening tasks (expository and argumentative) differing in reasoning demand on the listening comprehension scores of a number of Iranian EFL advanced learners. Sixty homogeneous advanced learners were recruited based on their performance on an English Language Proficiency test (Fowler & Coe, 1976). Then they...

DFDS: A Domain-Independent Framework for Document-Level Sentiment Analysis Based on RST

Document-level sentiment analysis has been among the most popular research fields of natural language processing in recent years, and one of its major challenges is that discourse structural information can hardly be captured by existing approaches. In this paper, a domain-independent framework for document-level sentiment classification with weighting rules based on Rhetorical Structure Theory is pro...

Publication date: 2017